neuron function
A Concept uniqueness and granularity
Here, we report statistics about the uniqueness of neuron concepts as we increase the maximum formula length of our explanations.

Figure S1: Number of repeated concepts across probed vision and NLI models, by maximum formula length.

Table S1: For probed image classification and NLI models, the average number of occurrences of each detected concept and the percentage of detected concepts that are unique (i.e., detected for exactly one neuron).

A.1 Image Classification

Figure S1 (left) plots the number of times each unique concept appears across the 512 units of ResNet-18 as the maximum formula length increases. Table S1 reports the mean number of occurrences per concept and the percentage of concepts that are unique (i.e., detected for exactly one neuron).
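For illustration, here is a minimal Python sketch of how such uniqueness statistics can be computed. The function name, the one-concept-per-neuron representation, and the toy concept strings are assumptions made for this example, not the paper's code.

```python
from collections import Counter

def concept_uniqueness(concepts):
    """Summarize concept reuse across probed units.

    `concepts` holds one detected concept (a formula string) per neuron,
    e.g. 512 entries for ResNet-18. Returns the mean number of occurrences
    per distinct concept and the fraction of distinct concepts that are
    detected for exactly one neuron.
    """
    counts = Counter(concepts)
    mean_occurrences = len(concepts) / len(counts)
    pct_unique = sum(c == 1 for c in counts.values()) / len(counts)
    return mean_occurrences, pct_unique

# Hypothetical example: three neurons share "water", one has "sky OR sea".
print(concept_uniqueness(["water", "water", "water", "sky OR sea"]))
# -> (2.0, 0.5): 2 occurrences per concept on average, 50% of concepts unique
```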
Effective Rank and the Staircase Phenomenon: New Insights into Neural Network Training Dynamics
Jiang Yang, Yuxiang Zhao, Quanhui Zhu
In recent years, deep learning, powered by neural networks, has achieved widespread success in solving high-dimensional problems, particularly those with low-dimensional feature structures. This success stems from the ability of neural networks to identify and learn low-dimensional features tailored to the problem. How neural networks extract such features during training remains a fundamental question in deep learning theory. In this work, we propose a novel perspective by interpreting the neurons in the last hidden layer of a neural network as basis functions that represent essential features. To explore the linear independence of these basis functions throughout training, we introduce the concept of 'effective rank'. Our extensive numerical experiments reveal a notable phenomenon: the effective rank increases progressively during learning, exhibiting a staircase-like pattern, while the loss function decreases as the effective rank rises. We refer to this observation as the 'staircase phenomenon'. For deep neural networks, we rigorously prove a negative correlation between the loss function and the effective rank, demonstrating that the lower bound of the loss decreases as the effective rank increases. Therefore, to achieve a rapid descent of the loss, it is critical to promote swift growth of the effective rank. Finally, we evaluate existing advanced training methodologies and find that they quickly reach a higher effective rank, thereby avoiding redundant staircase stages and accelerating the decline of the loss function.
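To make the quantity concrete, below is a minimal NumPy sketch of one common way to measure the effective rank of the last hidden layer's activations: counting singular values above a relative tolerance. The function name, the tolerance, and this particular definition are assumptions for illustration and may differ from the paper's exact formulation.

```python
import numpy as np

def effective_rank(H, tol=1e-3):
    """Count singular values of H above a relative tolerance.

    H is an (n_samples, n_neurons) matrix of last-hidden-layer
    activations; a larger count means the neurons span more linearly
    independent directions, i.e. more independent basis functions.
    """
    s = np.linalg.svd(H, compute_uv=False)  # singular values, descending
    if s[0] == 0:
        return 0
    return int(np.sum(s > tol * s[0]))

# Demo on a synthetic feature matrix of true rank 3:
rng = np.random.default_rng(0)
H = rng.standard_normal((1000, 3)) @ rng.standard_normal((3, 512))
print(effective_rank(H))  # prints 3
```

Tracking this number over training epochs is what reveals the staircase-like growth the paper describes.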
Rethinking the Function of Neurons in KANs
The neurons of Kolmogorov-Arnold Networks (KANs) perform a simple summation, motivated by the Kolmogorov-Arnold representation theorem, which asserts that addition is the only fundamental multivariate function. In this work, we investigate whether an alternative multivariate function for KAN neurons may offer greater practical utility. Our empirical study tests various multivariate functions in KAN neurons across a range of benchmark machine learning tasks. Our findings indicate that substituting the sum with the average function in KAN neurons yields significant performance gains over traditional KANs. Our study demonstrates that this minor modification stabilizes training by keeping the spline's input within the effective range of the activation function. Our implementation and experiments are available at: \url{https://github.com/Ghaith81/dropkan}
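A minimal PyTorch sketch of the idea follows, with a per-edge SiLU basis standing in for the B-splines used by real KANs. `ToyKANLayer`, its parameters, and the basis construction are illustrative assumptions, not the authors' implementation (see their repository above); only the sum-vs-average aggregation switch reflects the modification studied in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyKANLayer(nn.Module):
    """Simplified KAN-style layer: each edge (i -> j) applies its own
    learnable univariate function, and each neuron aggregates its
    incoming edge outputs with either 'sum' (original KAN) or 'mean'
    (the modification studied in the paper)."""

    def __init__(self, in_dim, out_dim, n_basis=4, reduce="mean"):
        super().__init__()
        self.reduce = reduce
        # Per-edge basis coefficients: (out_dim, in_dim, n_basis)
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, n_basis))
        # Fixed basis shifts shared across edges (stand-in for spline knots)
        self.register_buffer("shift", torch.linspace(-2.0, 2.0, n_basis))

    def forward(self, x):  # x: (batch, in_dim)
        # Evaluate the univariate basis at every input: (batch, in_dim, n_basis)
        basis = F.silu(x.unsqueeze(-1) - self.shift)
        # Per-edge univariate function values: (batch, out_dim, in_dim)
        edge = torch.einsum("bik,oik->boi", basis, self.coef)
        # Averaging keeps the neuron's output (the next layer's spline input)
        # on the same scale regardless of in_dim, unlike summation.
        return edge.mean(dim=-1) if self.reduce == "mean" else edge.sum(dim=-1)

layer = ToyKANLayer(in_dim=8, out_dim=4, reduce="mean")
y = layer(torch.randn(32, 8))  # -> shape (32, 4)
```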